Reparameterized multiobjective control of BCG immunotherapy

Bladder cancer is a cancerous disease that mainly affects elder men and women. The immunotherapy that uses Bacillus of Calmette and Guerin (BCG) effectively treats bladder cancer by stimulating the immune response of patients. The therapeutic performance of BCG relies on drug dosing, and the design of an optimal BCG regimen is an open question. In this study, we propose the reparameterized multiobjective control (RMC) approach for seeking an optimal drug dosing regimen and apply it to the design of BCG treatment. This approach utilizes constrained optimization based on a nonlinear bladder cancer model with impulsive drug instillation. We compare the performance of RMC with Koopman model predictive control (MPC) and validate the efficacy of optimal BCG dosing regimens through numerical simulations, demonstrating the efficient elimination of cancerous cells. The proposed control framework holds the potential for generalization to other model-based treatment designs.


Nonlinear bladder cancer model
In this study, a bladder cancer model 27,28 is employed, as depicted in Fig. 1.The model considers the impact of BCG vaccines on various populations, including uninfected tumor cells, infected tumor cells, and effector T cells.The administration of BCG vaccines to the tumor site converts uninfected tumor cells into infected tumor cells, which are subsequently targeted and eliminated by the effector cells.
Consider a general dynamic system with impulsive input: where X(t) ∈ R n s is the state vector that contains n s state variables, f X : R n s → R n s is a function of dynamics of X(t), t is the time index and t + refers to the time instant after t, B in ∈ R n s is the input coefficient vector, u in (t) is the input, and τ k ( k = 1, 2, ... ) is the time instant that the impulsive input is applied to the model.We use the BCG treatment model in the form of nonlinear differential equations (in Eq. 2 from 27 with impulsive inputs) that govern the concentrations of four states: BCG concentration (B) measured in units of 1 × 10 6 colony-forming units (c.f.u), activated immune system cell population (E) measured in units of 1 × 10 6 cells, infected tumor cell population ( T i ) measured in units of 1 × 10 6 cells, and uninfected tumor cell population ( T u ) measured in units of 1 × 10 6 cells.
where µ 1 and µ 2 are the decay rates of BCG and effector cells, respectively.p 1 − p 5 are the rates of transitions between cells.α is the infected tumor stimulation rate.β is the inverse of tumor carrying capacity.r is the tumor growth rate and u is the BCG dose.BCG is instilled into the tumor site, and optimal scheduling of dose u(t) is measured in units of 1 × 10 6 colony-forming units (c.f.u), is to be designed to eliminate T u .Then we have X(t) = [B(t), E(t), T i (t), T u (t)] T and B in = [1, 0, 0, 0] based on (1).Specifically, we have B(t + ) = B(t) + u(t) for t = τ k ( k = 1, 2, ... ), where τ k is the time instant that BCG is given.The description and numerical values of the parameters can be found in Table 1 27,28 .
The origins and sources of these parameters are referenced from 28 and are elaborated as follows: µ 1 was esti- mated through experimental work involving the cultivation of the Mycobacterium avium strain based on the (1) Ẋ(t) = f X (X(t)), t � = τ k X(t + ) = X(t) + B in u in (t), t = τ k , k = 1, 2, ...
experimental data in 29 ; r was derived from a median growth rate determined through an in vivo study 37 , where a logistic model for tumor growth was fitted to mammographic measurements from breast cancer patients using the least squares solution; β was computed under the assumption of a cylindrical tumor shape with a depth of 3 cell layers.A maximal diameter of 6.4 cm (reported by 36 ) is employed in the calculations; µ 2 , p 3 , and p 5 were estimated in 30 by fitting experimental data of chimeric mice with Bcl-1 lymphoma in their spleen 31 in the sense of nonlinear least squares, using the Hooke-Jeeves optimization method; p 1 was obtained in 32 as the max killing rate of bacteria by activated macrophages that can serve as Antigen Presenting Cells, based on experiments on Mycobacterium bovis BCG Substrains in 33 ; α was characterized as the maximum Th1/Th2 cell recruitment rate in 32 based on experiments about lymphocyte recruitment facilitated by Interleukin-8 in the inflammatory process in 38 ; p 2 was informed by 34 and 35 based on the internalization of BCG by incubating human bladder cancerous cell lines like T24 34 and the adhesion and internalization of BCG and effector cells 35 .Furthermore, p 2 and its numerical values are also reported in 39 and supported by clinical trial data 40 .The source for p 4 is not identified as informed by 28 .However, sensitivity analysis showed that p 4 minimally influences the output settling time, which will be shown later.
The model is discretized using the fourth order-Runge Kutta method, and the resulting discrete model has a state vector X(k) = [B(k), E(k), T i (k), T u (k)] T at time k with the output y k = y(k) = T u (k) and input u(k).The objective of this study is to optimize the scheduling of the dose u to eliminate the uninfected tumor cells T u .The BCG vaccine is administered at the tumor site using the dose u.The observed output of the immunotherapy model is assumed to be the population of uninfected tumor cells T u .If the system is observable, estimators can be used to obtain the values of other states.Based on guidelines from the Food and Drug Administration, each vial of the BCG vaccine contains 1 to 8 ×10 8 c.f.u, which is approximately 50 mg (wet weight).A previous study on BCG doses 7 suggests that the weekly dose ranges from 40 to 120 mg.Standard doses are typically 80 or 120 mg, which have shown better efficacy than lower doses (one-third to two-thirds of the standard dose).The treatment duration varies from 6 to 12 weeks.The scaling of the cell population is 1 × 10 6 cells for the dimensionless cell population values.

Reparameterized multiobjective control
Denote the sampling time as t and a unit dose of BCG as d (unit: 1 ×10 6 c.f.u).The time interval (or gap) between two consecutive doses is g (unit: days).The total number of treatments for the dosing scheme is N.
Then the cumulative drug doses can be formulated as d • N .The cumulative uninfected tumor cell population is k t k=0 T u (k) • �t , which is in the form of the area of the uninfected tumor cell population across time.The control objective is to design an optimal BCG dosing scheme to eliminate tumor cells when considering BCG cost.Thus, the objective function to be minimized comprises the cumulative population of uninfected tumor cells and cumulative BCG vaccines whose diagram is visualized in Fig. 2(a).
The terminal constraint on T u (k t ) is set to tumor-free condition T u = 0 at the end of the treatment (i.e., k = k t ), which is fixed as the maximal treatment period, i.e., k t = max(g) • max(N) = 100 .This constraint is relaxed with penalty weight w t .Besides, as the number of treatments has to be a positive integer within the boundary, a hard constraint N ∈ Z + is applied.To enhance the constraint's flexibility, we introduce relaxation through penalty terms in the objective function, controlled by penalty parameters.Specifically, we add the residual N − round(N) as a penalty term weighted by w N in the cost function.Furthermore, in the cost function, we normalize T u in both the stage cost and terminal cost using the maximal T u within the 100-day time horizon in the absence of BCG vaccines (i.e., u = 0).
The objective function is shown as follows where s(k) is a step function of time index k which is used to shape the input signals, as BCG vaccines are given to patients impulsively.The nonlinear model process f (x(k), u(k)) is the function of state vector x(k) (which contains T u (k) ) and input u(k).The penalty weight, denoted as , is employed to fine-tune the relative significance of two factors: cumulative tumor cells and drug concentration.A higher value of signifies a greater emphasis on administering fewer BCG vaccines, thereby permitting more cancerous cells to persist in patients throughout the treatment process.On the other hand, a lower value of prompts the controller to prioritize tumor elimination, even if it requires a relatively larger quantity of BCG vaccines to achieve this objective.As proposed by 28 , the dosage constraint applies to a single treatment.The parameter g is derived from the current weekly dosing scheme of BCG treatment, as outlined by the FDA.To provide a more flexible timeframe, the range of g has been slightly expanded from 5 to 10 days.Moreover, based on FDA guidelines, a weekly treatment scheme typically comprises a total of six treatments, spanning over 42 days.To accommodate this expanded range, we define the period as ranging from min(g) • min(N) to max(g) • max(N). ( min {d,N,g} The optimal parameter set that minimizes the objective function is to be searched by optimization algorithms.We first employ Particle Swarm Optimization (PSO) 42 .PSO iteratively updates the position P i (k) ∈ R 3 and veloc- ity V i (k) ∈ R 3 of each particle i using the following update equations: where the inertia weight constant w ∈ [0, 1] retains a portion of the particle's previous search direction and speed.The constants c 1 and c 2 assign weights to the impact of individual particles and all particles, respectively.The random numbers r 1 and r 2 are sampled from a standard uniform distribution U[0, 1] and help prevent premature convergence.p i best is the best position of particle i, and g best is the best position of all particles.The term (p i best − P i (k)) guides the particle back to its best position encountered so far, while (g best − P i (k)) brings the particle back towards the overall best position found among all particles.The flowchart of PSO is visualized in Fig. 2(b) 41 .
In addition, we employ two other metaheuristic optimization techniques (Ant Colony Optimization (ACO) 43 and Simulated Annealing 44 ) to compare their cost-associated optimal solutions to PSO using the same initial point and evaluate changes in the minimum cost function value and optimal dosing schemes.Simulated annealing probabilistically explores solution spaces by making random moves, guided by a probability parameter P 44 .Denote c as the difference in cost between the new and old solutions.If a move reduces cost ( �c < 0 ), it's accepted with certainty ( P = 1 ), while if it increases cost ( �c > 0 ), it's accepted with a decreasing probability ( P = e − c T ).The temperature T decreases geometrically over time ( Ant k selects an edge connecting vertices x and y using probability p k xy defined as . Here, ρ xy represents pheromone on edge xy, and η xy indicates move attractiveness, with α and β controlling their influence.Pheromone is updated as where ǫ is pheromone evaporation rate, m is the total ants, and �ρ k xy is pheromone deposited by ant k, equal to Q/L k if it traverses edge xy.

Koopman model predictive control
We employ the Koopman MPC method 23 on the BCG treatment model, incorporating impulsive control inputs.This approach, previously implemented in our study 45 , is utilized to design an optimal dosing scheme and serves as a basis of comparison against RMC.To capture the model's nonlinear behaviors, we employ Koopman linearization 46 .This method utilizes a finite subset of the infinite-dimensional Koopman operator ( κ ), mapping observable variables into a higher-dimensional space with new basis functions.This enables a linear transformation among observables 46,47 .
The extended state Z k ∈ R N is formed by combining observables and control inputs.Its dimension- ality is given by N = n y (n d + 2) + (n d + 1)n u 23 , where n d is the number of delay units, n y (set as 1) is the number of outputs, and n u is the number of inputs.With time series data spanning y k to y k+n t for out- puts and u k to u k+n t −1 for inputs over a total simulation time n t , the extended state at time step k is expressed as Z k = [y k+n t −1 , . . ., y k , u k+n t −2 , . . ., u k ] T .The one-step-ahead extended state is denoted as Z k+1 = [y k+n t , . . ., y k+1 , u k+n t −1 , . . ., u k+1 ] T .The dynamics of Z(k) are described by the Koopman operator κ : Here, the function f : R N → R N captures the dynamic behaviors of Z k , and φ : R N → K maps the lifted function space to Koopman space.In the infinite-dimensional space, the Koopman operator has a spectrum κφ = ∞ i=1 i m i e i , where i represents Koopman eigenvalues, m i ∈ K are weights, and e i are Koopman eigenfunctions 48 .We use Extended Dynamic Mode Decomposition 48 to approximate the Koopman operator.This involves lifting system observables over Radial Basis Functions (RBFs) to enhance accuracy.Specifically, thin plate splines RBFs B(Z k , C) ∈ R N rbf are used as suggested in 48 due to the irregular nature of the Koopman spectral analysis, where N rbf represents the number of RBFs.The lifted state ) and B lift is the input matrix ( B lift ∈ R (N+N rbf )×n u ).The Frobenius norm of the difference between the observables and the states generated by the finite-dimensional Koopman-linearized model is minimized using the least-squares solution, i.e., � (κφ where � • � F denotes the Frobenius norm.Furthermore, a projection from the Koopman space to the original space is desired.This is accomplished by using C lift ∈ R (N+N rbf ) to predict X(k) in the function space of X(k) from the function space of Z lift (k) , where X(k) = C lift Z lift (k) .The matrix C lift is obtained by minimizing � X(k) − C lift Z lift (k) � F .By stacking Z lift (k) and U(k), the least-squares solution is obtained as shown in Eq. ( 6).
where † is pseudo-inverse operatoration.Time series data comprising observables and inputs are obtained by simulating the model using random initial states and input sequences that satisfy the specified constraints.The process of Koopman linearization is summarized in Fig. 3a.Next, we adopt the MPC algorithm 17 for the design.
The optimization problem in MPC at time k can be formulated as follows where y r (k + j) ∈ R is the set point on reference trajectory y r (k + j) = T u (0)e −(k+i)•�t that decays exponentially from its initial value to the terminal value X k t after 6 weeks.We only use the terminal constraint on T u .The control input sequence at time k, denoted as u(k) := {u(k + j)} H p −1 j=0 , represents control actions over a prediction horizon.Control inputs beyond the control horizon ( H u ≤ j ≤ H p − 1 ) are set to zero.A single treatment's minimum and maximum dosage limits are denoted as u min and u max , respectively.The initial condition is X 0 .Weights Q ∈ R 4×4 and R ∈ R determine the penalties for state deviation and control effort.Q and R are positive semidefinite ( Q, R 0 ).H p is the prediction horizon and H u is the control horizon.The l 2 norm ( � • � ) measures state deviation and control effort.The diagonal weight matrix P ∈ R 4×4 represents the terminal cost weighting and is positive semidefinite ( P 0 ).P is obtained from a discrete-time algebraic Riccati equation , where i denotes the time index, and G i represents the i-th row of Due to the impulsive inputs, only the first in the control sequence U(k) is nonzero, resulting in all zero values from the second column onwards in G. Consequently, the cost function can be represented as: where H = G T QG + R , F = G T QW , and V = W T QW + Q .The optimal input sequence * is obtained by solving the optimization problem in (9) as follows (7)   min u(k) The Koopman operator utilizes spectral methods to transform the nonlinear model into a linear one.This process involves using time series data generated from the nonlinear model.(b) Linear MPC algorithm based on the linearized model.Constrained optimization is performed to obtain the optimal control sequence, which corresponds to the dosing scheme.This optimized sequence will then be implemented on the original nonlinear system.Figure adapted from our previous study 45 .
where b cs = [X max , −X min ] T , B cs = [−A i , A i ] T , and A cs = [G i , −G i ] T .The inequalities in (9) represent the con- straints and are handled using the active set method and Karush-Kuhn-Tucker conditions.The diagram of applying linear MPC is shown in Fig. 3(b).

Uncertainty analysis and sensitivity analysis
We employ uncertainty analysis and sensitivity analysis to elucidate the varying patient responses to therapy, shedding light on how parameter perturbations can reveal the impact of inherent variability in individuals undergoing cancer treatments.
Uncertainty analysis of the model is conducted to assess the robustness when faced with parameter uncertainty arising from limited data and patient variability.For model parameter uncertainty analysis, the model can be denoted as Y = f u (P 1 , P 2 , . . ., P n ) , relates the model output (Y) to a set of model parameters ( P i ), where n represents the number of parameters, and f u represents the function relating input parameters to the output.It is assumed that the model parameters follow normal distributions, expressed as , where µ i signifies the nominal value of P i , and a standard deviation of σ i = 0.1µ i is employed.For analysis of uncertain initial conditions, the potential inaccuracies in the initial values of the four states are caused by noisy measurements in imaging methods.For instance, immunohistochemistry has been reported to achieve up to 90% accuracy 49 .Thus, we assume a uniform distribution for the measured noisy initial conditions to address this.Specifically, if the measured initial state is denoted as X(0), the true initial conditions are assumed to follow a uniform distribution U[ X(0) 1.1 , X(0) 0.9 ].Variance-based sensitivity analysis is performed to assess parameter influence on the settling time of the controlled system using Sobol indices 50 .A model Y s = f s (P, x 0 ) has n changing parameters P = {par 1 , par 2 , . . ., par n } .The model output is decomposed into f s i (par i ) , f s ij (par i , par j ) , and so on, with corresponding variances V i , V ij , and higher-order terms.First-order indices S i and total-order indices S T i are calculated to evaluate parameter contributions, using Python package SAlib 51,52 .

Results
RMC and Koopman MPC were used to design the optimal dosing schemes for BCG treatment.The initial state [0.1, 0.1, 0, 0.8] was used for testing for both designs, indicating an early-stage bladder cancer with few effector T cells (i.e., E(0) = 0.1 × 10 6 ) and no infected tumor cells to begin with.The weights w 1 = 200 , = 1 , w N = 10 4 and w t = 10 5 were used in the cost function in Eq. ( 3).Also, the values of B(0), E(0), and T i (0) were relatively low compared to T u (0) to emphasize the decrease in T u .Once determined, the drug doses and the gap between treatments remained unchanged during the simulation.The three optimizers started with the same initial condition [d 0 , g 0 , N 0 ] = [2.2, 10, 3] , i.e., the lowest dose, the largest gap, and the least number of treatments, indicating the dosing scheme was initiated at the least amount of doses that might not be sufficient to eliminate cancerous cells.In PSO, we used w = c 1 = c 2 = 0.5 and the swarm size of 100 particles and obtained the best solution [d, g, N] = [5.46,5, 10] with a cost of 87.9 in one iteration.In Simulated Annealing, we set C r = 0.95 and initial temperature 1, and the best solution obtained was [d, g, N] = [6.4,5, 10] with a cost of 91.2, as shown in Fig. 4(a).In ACO, we used heuristics G, 1/D, and 1/N for priori information.An optimal solution [d, g, N] = [6.4,5, 10] with cost 91.2 was obtained, as shown in Fig. 4(b).Notably, this solution aligns with the best outcome obtained through simulated annealing.The results from three optimization algorithms showed that PSO quickly converged to an optimal solution in a single iteration, while ACO and Simulated Annealing required more iterations.( 9) PSO outperformed the others with the lowest-cost solution.Note that tuning parameters in the algorithms can potentially improve performance further and reduce costs.
Additionally, the optimal combination of g and N touched the boundaries of the constraints.This validates our design, as d and N tended to increase and g tended to decrease to obtain a more aggressive control action to eliminate cancerous cells.To verify if the solution is a global optimum or a local optimum, we expanded the variable boundaries for d ([2.2, 6.4] to [0, 50]), g ( [5, 10] to [1, 20]), and N ( [3, 10] to [1, 20]), which improved the solution to [3.22, 1, 20] with a lower cost of 82.2 after 1 iteration (by using same setting in PSO), in contrast to the original boundaries ( [d, g, N] = [5.46,5, 10] with a cost of 87.9).This suggests that the initial solution was a local minimum.However, certain constraints remain crucial for clinical feasibility and FDA guideline adherence, especially regarding safety with BCG doses.
The analysis of uncertain model parameters entails 200 Monte Carlo runs using a 95% confidence level.The results for the uncertain model parameters and uncertain initial conditions using the dosing scheme by RMC are visually depicted in the shaded regions within Fig. 5(a) and (b), respectively.We used the time that T u was brought to be below 1% of its initial value as a reference to assess the performance.For models with uncertain parameters, we observed an average time of 20.48 days with a standard deviation of 2.94 days.For models with uncertain initial conditions, the average time was 20.42 days with a standard deviation of 0.03 days.Figure 5(c)  and d show the results of uncertain model parameters and uncertain initial conditions for KMPC, which has an average time of 18.15 days with a standard deviation of 3.94 days, and an average time of 17.70 days with a standard deviation of 1.79 days, respectively.
In sensitivity analysis, we considered parameters in Table 1, assuming uniform distributions spanning 0.8 to 1.2 times their nominal values.The initial state x 0 was treated as a unified entity and only different scales of the same x 0 were considered, simplifying the analysis while examining its influence.The output Y u represents settling time.Figure 6 shows first-order and total-order Sobol indices from 1000 iterations.The parameter p 4 has total and first-order Sobol indices of 0.0064 and 0.0046, in contrast to more impactful parameters like p 2 , with total and first-order Sobol indices of 0.65 and 0.45.This shows the output settling time is not sensitive to the selection of numerical values of p 4 , as claimed earlier in the model section.
For Koopman MPC, the linear Koopman operator was approximated using the spectral method with 100 trajectories of time series data (generated from the nominal model with random initial conditions and inputs).Ten RBFs were tested with good performance and their centers were obtained by the K-means algorithm.The sampling time of 0.01 days was used, and the simulation length was 100 days.The delay of 1 was used for onestep time delay embedding.The control interval was set to 5 days, which is the same as the treatment interval in RMC for comparison.The accuracy of linearization was evaluated by comparing the model output (i.e., T u ) as shown in Fig. 7(a).The output of the Koopman-based linear model (in red) is closer to the model dynamic behavior (in yellow) for most of the time during the simulation, compared to local linearization using the truncated Taylor expansion, even though it had more deviation from the true dynamic behaviors at the beginning, indicating its good approximation performance over time.Linear MPC was applied to the Koopman-linearized model to optimize BCG dosing.The simulation involved the administration of BCG vaccines every 5 days, and a prediction horizon of 5 days was used to balance control performance and computational complexity.The control horizon was set to one sampling time.Weighting matrices Q = diag(0, 0, 0, 1000) and R = 0.1 were used to focus more on state errors.
Both dosing schemes effectively eliminated the uninfected tumor cells but differed in their treatment plans.Figure 7(b) compares the dosing schemes designed by RMC and Koopman MPC.For patients with identical initial conditions, RMC's dosing scheme administered smaller doses during the initial phase of treatment, followed by larger doses in the later phase, in contrast to Koopman MPC's approach.The aggressive control action in KMPC led to rapid elimination of cancerous cells, resulting in treatment completion in approximately 30 days.However, this approach may simultaneously induce more severe side effects, making it more suitable for patients who exhibit resistance or tolerance to high drug doses.On the other hand, the more gentle treatment plan illustrated in RMC resulted in a gradual reduction of tumor cells and a longer treatment duration.

Conclusion and discussion
In this study, we introduced RMC as a novel control strategy for designing optimal drug dosing schemes in the context of a BCG treatment model for bladder cancer, and compared its performance to the existing Koopman MPC approach.Both strategies effectively eliminated uninfected tumor cells but differed in their underlying principles.RMC optimized the impulsive BCG dosing scheme by balancing cumulative doses and cancerous cell populations.Using PSO, we found the optimal solution for the variables that defined the dosing scheme.We compare the modeling results of using RMC with data from clinical trials on the BCG efficacy in 53 and the side  effects in 54 from 1316 patients.These studies primarily employed the OncoTICE strain, characterized by containing 5 ×10 8 colony-forming units (c.f.u) per standard dose.Given that approximately 99% of the BCG departs from the bladder within 2 hours post-instillation 28 , a dose of 5.6 ×10 6 c.f.u of BCG, intended to be sustained within the bladder.The best dosing scheme in our study is [D, G, N] = [5.46,5, 10] .Then the dose from our solution is roughly equivalent to 0.975 standard doses, and the 5-day gap in our proposed dosing scheme signifies a more intensive instillation schedule compared to the typical weekly BCG instillations.In total 10 treatments result in a cumulative treatment duration of 50 days, which is close to the standard course consisting of 6 weekly treatments.Note that the maintenance treatments are not considered here.Both studies concluded that low-dose treatment (i.e., 1/3 of the standard dose) was associated with a higher recurrence rate compared to the standard dose.Nevertheless, no significant differences in observed toxicity were reported across the varying dosage levels.Given that our dosing scheme approximates the standard doses, it can be inferred that the recurrence rate is likely to remain low.It's worth noting that even a slightly shorter gap between instillations (from 7 to 5 days) could potentially yield different side effects.Consequently, evaluating potential side effects necessitates more extensive clinical trials for conclusive insights.Additionally, constraint relaxation is used to transform the hard constraints into soft constraints in the cost function.Exploring solutions that adhere to the hard constraints can also be achieved through mixed-integer optimization.For example, we can examine optimal solutions for d and g while setting N to values such as N = {5, 6, 7, 8, 9, 10} to satisfy N ∈ Z + .
Koopman MPC leveraged the linearized model based on the Koopman operator, which closely resembles the original nonlinear dynamics of the uninfected tumor cell population.This was in contrast to the Taylor approximation, which required operating points and might not always be available.The control performance exhibited a desired decrease in the uninfected tumor cell population with the impulsive drug dosing regimen employed by both controllers.The optimality of these strategies effectively balanced the therapeutic performance of the treatment with the physical constraints of the patients.Considering that most disease and treatment models in the real world are nonlinear and many drugs are administered impulsively, the Koopman-based impulsive MPC framework holds promise for application in various disease treatments due to its linearity and impulsivity.The Koopman linearized model offers improved accuracy compared to the linear approximation, as it does not rely on specific operating points for its formulation.
The two dosing schemes have relatively gentle and aggressive doses, as shown in the above.The main reason for RMC to provide gentle control inputs is that we kept the dose d constant throughout the entire time horizon.Conversely, KMPC exhibited more aggressive dosing due to the absence of a penalty on the �u = u(t + 1) − u(t) term, which restricts input increments in the cost function.To encourage a gentler dosing strategy akin to RMC, we can introduce the term || u|| 2 2 weighted by w u , where || • || 2 denotes the L2 norm.Besides, one notable advan- tage of RMC over Koopman MPC is its ability to handle nonlinear models without the need for linearization techniques directly.This feature facilitates the generalization of RMC to other treatment models with impulsive drug doses and constraints.For practical consideration, RMC is cost-effective for healthcare settings with limited patient data, as it operates in an open-loop manner without requiring extensive patient measurements.In contrast, KMPC, which relies on patient feedback and measurement updates to maintain model accuracy, is more expensive but excels in adapting to significant changes in a patient's condition.Note that KMPC is adapted to large changes in patient condition, as the uncertainty analysis already showed its robustness to small changes in model parameters.The choice between RMC and KMPC depends on the specific resource and adaptability requirements of the healthcare context.Additionally, in this study, the magnitude and gap of doses remained unchanged throughout the entire treatment.However, there is potential for further improvement by allowing these values to vary, potentially leading to lower values of the objective function.This flexibility would enable customization of the dosing scheme for each treatment, considering the specific needs of patients.
The integration of abundant molecular data from cancer patients offers great promise for advancing precision medicine.One potential approach involves identifying parameters influenced by gene expression variations related to cancer and treatment, then quantifying these relationships.For instance, genes like BCL2 and BAX, known apoptosis regulators 55 , may impact the cancer cell growth rate r.Similarly, the gene CD4 could contribute to the activation rate of the immune response p 4 due to BCG treatment-induced effects on effector T cells CD4 + 56 .Recent research analyzing genomic profiles in bladder cancer patients receiving BCG treatment has identified differentially expressed genes and signaling pathways 57 , potentially establishing quantitative links between gene expressions and model parameters.These connections could be clarified by linking parameter functions with relevant signaling pathways.
Overall, the reparameterization of the control process effectively addresses the nonlinearity of the model and enables the inclusion of multiple control objectives, including constrained optimization.The proposed control scheme has the potential for generalization to similar control problems with minimal modifications, streamlining the control design process and enhancing scalability.

Figure 2 .
Figure 2. (a) Diagram of the objective function in RMC.The multiobjective control designs the drug dosing scheme that minimizes the cumulative uninfected tumor cell population and the cumulative drug dosages.d is the dosage of a single treatment, and g is the gap between two consecutive treatments.x(g) is the state vector that contains uninfected tumor cell population T u (g) at time g.(b) The flowchart of PSO algorithm (adapted from 41 ).p best and g best are the individual and global optimum values of the particles.

Figure 3 .
Figure 3.The design of optimal drug dosing scheme using Koopman MPC.(a) Koopman linearization process.The Koopman operator utilizes spectral methods to transform the nonlinear model into a linear one.This process involves using time series data generated from the nonlinear model.(b) Linear MPC algorithm based on the linearized model.Constrained optimization is performed to obtain the optimal control sequence, which corresponds to the dosing scheme.This optimized sequence will then be implemented on the original nonlinear system.Figure adapted from our previous study45 .

Figure 4 .
Figure 4.The best solutions and the minimal cost obtained by using (a) the Simulated Annealing method and (b) the Ant Colony Optimization.They obtain the same optimal solution [d, g, N] = [6.4,5, 10] with the lowest cost of 91.2.Both of the optimization techniques tend to obtain solutions with lower doses and more treatments for the settings used in this study.

Figure 5 .
Figure 5. Uncertainty analysis with (a,b) RMC-derived inputs and (c,d) KMPC-derived inputs.The time that T u is brought to be below 1% of its initial value is used as the measurement.(a) Parameter uncertainty analysis shows an average time of 20.48 days with a standard deviation of 2.94 days.(b) Initial condition uncertainty analysis shows an average time of 20.42 days with a standard deviation of 0.03 days.(c) With inputs from KMPC, parameter uncertainty analysis shows an average time of 18.15 days with a standard deviation of 3.94 days.(d) With inputs from KMPC, initial condition uncertainty analysis shows an average time of 17.70 days with a standard deviation of 1.79 days.The shadow area represents the 95% confidence interval.The bar plots show the statistics of the time t for T u (t) < 0.01T u (0).

Figure 6 .
Figure 6.Sobol sensitivity analysis on model parameters.The first-order and the total-order Sobol indices are visualized, and the higher-order indices are ignored.The output settling time is sensitive to the changes in the parameters p 2 and µ 1 .

Figure 7 .
Figure 7. (a) The Koopman operator demonstrates effective linearization performance.Compared to the linear approximation (in yellow curve), Koopman linearization (in red curve) closely aligns the dynamics of the model observable with the output from the nonlinear model (in blue curve).(b) The comparison of the dosing schemes by RMC and Koopman MPC.RMC employs a gentle dosing scheme, resulting in a longer treatment duration.In contrast, Koopman MPC implements an aggressive control action with high doses administered for over half of the treatment period, leading to faster elimination of tumor cells.